Modern distributed cyber-physical systems encounter a large variety of anomalies and, in many cases, are vulnerable to catastrophic fault propagation scenarios due to strong connectivity among the subsystems. In this regard, root-cause analysis becomes highly intractable due to complex fault propagation mechanisms in combination with diverse operating modes. This paper presents a new data-driven framework for root-cause analysis that addresses such issues. The framework is based on a spatiotemporal feature extraction scheme for multivariate time series, built on the concept of symbolic dynamics, for discovering and representing causal interactions among subsystems of a complex system. We propose sequential state switching ($S^3$) and artificial anomaly association ($A^3$) methods to implement root-cause analysis in an unsupervised and a semi-supervised manner, respectively. Synthetic data from cases with failed pattern(s) and anomalous node(s) are simulated to validate the proposed approaches, which are then compared with vector autoregressive (VAR) model-based root-cause analysis. The results show that: (1) the $S^3$ and $A^3$ approaches can obtain high accuracy in root-cause analysis and successfully handle multiple nominal operation modes, and (2) the proposed tool-chain is scalable while maintaining high accuracy.
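To make the symbolic-dynamics idea concrete, the following is a minimal sketch and not the paper's $S^3$ or $A^3$ procedure. It assumes an equal-frequency partitioning of each time series into a small alphabet and then estimates how the current symbol of one subsystem predicts the next symbol of another. The function names `symbolize` and `cross_transition_matrix`, the alphabet size, and the partitioning choice are all illustrative assumptions.

```python
from collections import Counter


def symbolize(series, n_symbols=3):
    """Map each sample to a symbol 0..n_symbols-1 using equal-frequency bins.

    Assumption: bin edges are taken at empirical quantiles of the series.
    """
    ranked = sorted(series)
    n = len(ranked)
    edges = [ranked[(k * n) // n_symbols] for k in range(1, n_symbols)]
    # The symbol is the number of edges the sample reaches or exceeds.
    return [sum(x >= e for e in edges) for x in series]


def cross_transition_matrix(src, dst, n_symbols=3):
    """Estimate P(dst[t+1] = j | src[t] = i) from two symbol sequences."""
    counts = Counter(zip(src[:-1], dst[1:]))
    matrix = []
    for i in range(n_symbols):
        row_total = sum(counts[(i, j)] for j in range(n_symbols))
        if row_total == 0:
            # Unobserved source symbol: fall back to a uniform row.
            matrix.append([1.0 / n_symbols] * n_symbols)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_symbols)])
    return matrix


if __name__ == "__main__":
    node_a = symbolize([0, 1, 2, 3, 4, 5])
    node_b = symbolize([5, 4, 3, 2, 1, 0])
    for row in cross_transition_matrix(node_a, node_b):
        print(row)
```

Each row of the resulting matrix is a conditional distribution over the target subsystem's next symbol. Under this sketch, a pairwise pattern whose matrix shifts noticeably between nominal and observed data would be a candidate for closer inspection; how such patterns are scored and attributed to a root cause is the part the proposed methods address.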